Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENG-50887: Mask value using a masking config #227

Merged
merged 28 commits into from
Oct 10, 2024

Conversation

siddhant2001
Copy link
Contributor

@siddhant2001 siddhant2001 commented Sep 27, 2024

  • Write tests and verify correctness

Copy link

github-actions bot commented Sep 27, 2024

Test Results

330 tests  +5   330 ✅ +5   11s ⏱️ ±0s
 42 suites ±0     0 💤 ±0 
 42 files   ±0     0 ❌ ±0 

Results for commit d38e08c. ± Comparison against base commit e092fbf.

This pull request removes 72 and adds 18 tests. Note that renamed tests count towards both.
            long: 3600000
            valueType: LONG
          columnName: "SERVICE.startTime"
          long: 1570658506605
          long: 1727521612551
          long: 1727521614904
          long: 1727521615242
          long: 1727521633079
          long: 1727525212711
          long: 1727525214904
…
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [1] filter {
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "EVENT.spanTags"
      }
    }
    operator: CONTAINS_KEY
    rhs {
      literal {
        value {
          string: "span.kind"
        }
      }
    }
  }
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "EVENT.spanTags"
        subpath: "span.kind"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          string: "client"
        }
      }
    }
  }
}
selection {
  at…, 10, server
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [1] filter {
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "API_TRACE.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556715813
        }
      }
    }
  }
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "API_TRACE.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560315969
        }
      }
    }
  }
  …
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [1] filter {
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "BACKEND.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556718115
        }
      }
    }
  }
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "BACKEND.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560318115
        }
      }
    }
  }
  chil…
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [1] filter {
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "SERVICE.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556718415
        }
      }
    }
  }
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "SERVICE.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560318415
        }
      }
    }
  }
  chil…
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [2] filter {
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "API_TRACE.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556715813
        }
      }
    }
  }
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "API_TRACE.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560315969
        }
      }
  …
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [2] filter {
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "BACKEND.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556718115
        }
      }
    }
  }
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "BACKEND.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560318115
        }
      }
    }
…
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [2] filter {
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "EVENT.startTime"
      }
    }
    operator: GT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1570658506605
        }
      }
    }
  }
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "EVENT.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 2570744906673
        }
      }
    }
  }
…, 2, server
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [2] filter {
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "SERVICE.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556718415
        }
      }
    }
  }
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "SERVICE.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560318415
        }
      }
    }
…
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [3] selection {
  function {
    functionName: "sum"
    arguments {
      attribute_expression {
        attributeId: "EVENT.spanTags"
        subpath: "otel.status_code"
      }
    }
  }
}
, 1, 0.0
org.hypertrace.core.query.service.htqueries.HTPostgresQueriesTest ‑ [1] filter {
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "SERVICE.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556736851
        }
      }
    }
  }
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "SERVICE.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560336851
        }
      }
    }
  }
  chil…
…

♻️ This comment has been updated with latest results.

tenantScopedMaskingCriteria = [
{
"tenantId": "testTenant",
"timeRangeAndMaskValues": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we can have multiple TimeRange conditions, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, each timeRange condition will have it's own masking values

@@ -94,6 +94,27 @@ service.config = {
# ]
# }
# ]
tenantScopedMaskingCriteria = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's comment it in main application.conf file here. As a default, there is no masking applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes done, you've reviewed an older commit

public class HandlerScopedMaskingConfig {
private static final String TENANT_SCOPED_MASKS_CONFIG_KEY = "tenantScopedMaskingCriteria";
private final Map<String, List<MaskValuesForTimeRange>> tenantToMaskValuesMap;
private HashMap<String, Boolean> shouldMaskAttribute = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is key in both these maps - shouldMaskAttribute and maskedValue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AttributeId is the key to both.
Removing shouldMaskAttribute as it is not needed.


private static boolean isTimeRangeOverlap(
MaskValuesForTimeRange timeRangeAndMasks, Instant queryStartTime, Instant queryEndTime) {
boolean timeRangeOverlap = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default should be false, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following conditionals check for no overlap, i.e. they set the timeRangeOverlap to false. This statement is correct.

timeRangeOverlap = false;
}

Instant endTimeInstant = Instant.ofEpochMilli(timeRangeAndMasks.getStartTimeMillis().get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be getEndTimeMillis?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, is this what we are looking as function?

if (timeRangeAndMasks.getStartTimeMillis().isPresent()) {
      
      Instant startTimeInstant = Instant.ofEpochMilli(timeRangeAndMasks.getStartTimeMillis().get());
      Instant endTimeInstant = Instant.ofEpochMilli(timeRangeAndMasks.getEndTimeMillis().get());

      if (!(startTimeInstant.isAfter(queryEndTime) || endTimeInstant.isBefore(queryStartTime))) {
        return true;
      }
    }
    return false;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, fixed the condition.

}

public void parseColumns(ExecutionContext executionContext) {
shouldMaskAttribute.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the state is maintained per request, but we should only test against the timeRange condition.

Pre-compute using config:

  • tenantId -> List of timeRanges
  • timeRange -> set of attributes
  • timeRange -> map(attributeId, maskValue)

During response processing:

  • Check if any time range matches.
  • Pick the first match (or should we apply UNION?).
    • If UNION is used, and the same attribute is present in two time ranges with different mask values, which one should we consider? I guess any Should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how I've done it.
In case of attribute in multiple time ranges, I choose any value.

return Observable.fromIterable(rowBuilderList)
.map(Builder::build)
// .map(row -> handlerScopedMaskingConfig.mask(row))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this?

# "tenantId": "testTenant",
# "timeRangeAndMaskValues": [
# {
# "startTimeMillis": 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we should take the timestamp as mandatory.

  • if startTime or endTime missing -> log an warn stating that the filter will be ignored.

if (indexToLogicalName.containsKey(colIdx)) {
return indexToLogicalName.get(colIdx);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove additional space.

@@ -149,4 +157,12 @@ String getDataFromRow(int rowIndex, String logicalName) {
}
return result;
}

String getLogicalNameFromColIdx(Integer colIdx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Lets see if we can use Optional as return type.

@@ -58,6 +59,10 @@ public class PinotBasedRequestHandler implements RequestHandler {
private static final String START_TIME_ATTRIBUTE_NAME_CONFIG_KEY = "startTimeAttributeName";
private static final String SLOW_QUERY_THRESHOLD_MS_CONFIG = "slowQueryThresholdMs";

private static final String MASKED_VALUE = "*";
// This is how empty list is represented in Pinot
private static final String PINOT_EMPTY_LIST = "[\"\"]";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PINOT_EMPTY_LIST -> ARRAY_TYPE_MASKED_VALUE
MASKED_VALUE-> DEFAULT_MASKED_VALUE

public List<String> getMaskedAttributes(ExecutionContext executionContext) {
String tenantId = executionContext.getTenantId();
List<String> maskedAttributes = new ArrayList<>();
// maskedValue.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this.

this.tenantId = config.getString(TENANT_ID_CONFIG_KEY);
this.maskValues =
config.getConfigList(TIME_RANGE_AND_MASK_VALUES_CONFIG_KEY).stream()
.map(MaskValuesForTimeRange::new)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter out the empty maskings

@@ -536,7 +549,11 @@ private void handleSelection(
for (String logicalName : selectedAttributes) {
// colVal will never be null. But getDataRow can throw a runtime exception if it failed
// to retrieve data
String colVal = resultAnalyzer.getDataFromRow(rowId, logicalName);
String colVal =
!maskedAttributes.contains(logicalName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit : get rid of the ! and invert to simplify

@@ -493,21 +500,26 @@ private Filter rewriteLeafFilter(
return queryFilter;
}

Observable<Row> convert(ResultSetGroup resultSetGroup, LinkedHashSet<String> selectedAttributes) {
Observable<Row> convert(ResultSetGroup resultSetGroup, ExecutionContext executionContext) {
LinkedHashSet<String> selectedAttributes = executionContext.getSelectedColumns();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this also inside resultSetGroup.getResultSetCount() > 0?

for (int colIdx = 0; colIdx < resultSet.getColumnCount(); colIdx++) {
for (int colIdx = 0, logicalColIdx = 0;
colIdx < resultSet.getColumnCount();
colIdx++, logicalColIdx++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need logicalColIdx?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be multiple map fields. When creating the idxToLogical name map, a map field would only increment the counter by one. Here each map field increments colIdx by 2. That's why I have created a new variable, which is incremented only once even if we go through a maps 2 columns (key and value)

@@ -149,4 +158,8 @@ String getDataFromRow(int rowIndex, String logicalName) {
}
return result;
}

Optional<String> getLogicalNameFromColIdx(Integer colIdx) {
return Optional.ofNullable(indexToLogicalName.get(colIdx));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On what scenario, can it be null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it shouldn't be

@kotharironak kotharironak self-requested a review October 10, 2024 11:28
kotharironak
kotharironak previously approved these changes Oct 10, 2024
Copy link
Contributor

@kotharironak kotharironak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@siddhant2001 siddhant2001 merged commit cd60ebb into main Oct 10, 2024
6 checks passed
@siddhant2001 siddhant2001 deleted the mask-unbofuscated-cookies branch October 10, 2024 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants